
    Generative Knowledge Selection for Knowledge-Grounded Dialogues

    Knowledge selection is key in knowledge-grounded dialogues (KGD): the task of selecting an appropriate knowledge snippet to be used in the next utterance based on the dialogue history. Previous studies mainly employ a classification approach that labels each candidate snippet as "relevant" or "irrelevant" independently. However, such approaches neglect the interactions between snippets, making it difficult to infer the meaning of individual snippets, and they do not model the discourse structure of dialogue-knowledge interactions. We propose a simple yet effective generative approach to knowledge selection, called GenKS. GenKS learns to select snippets by generating their identifiers with a sequence-to-sequence model, and therefore captures intra-knowledge interactions inherently through attention mechanisms. Meanwhile, we devise a hyperlink mechanism to model dialogue-knowledge interactions explicitly. We conduct experiments on three benchmark datasets and verify that GenKS achieves the best results on both knowledge selection and response generation.
    Comment: Findings of EACL-2
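    As a rough sketch of this generative-selection interface, the snippet below serializes identifier-tagged knowledge snippets together with the dialogue history and has a sequence-to-sequence model generate an identifier. The `<k1>`/`<k2>` markers, the toy snippets, and the off-the-shelf `t5-small` checkpoint are illustrative assumptions; an untrained checkpoint will not select meaningfully, and GenKS's hyperlink mechanism is not shown.

```python
# A minimal sketch of generative knowledge selection, assuming a T5 checkpoint.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

dialogue = "User: Who directed Inception?"
snippets = {
    "<k1>": "Inception is a 2010 film directed by Christopher Nolan.",
    "<k2>": "Christopher Nolan was born in London in 1970.",
}

# Serialize history and identifier-tagged snippets into one input sequence,
# so self-attention can model interactions *between* snippets.
source = dialogue + " " + " ".join(f"{kid} {text}" for kid, text in snippets.items())
inputs = tokenizer(source, return_tensors="pt", truncation=True)

# A trained model would be supervised to emit the relevant snippet's
# identifier; here we decode greedily just to illustrate the interface.
ids = model.generate(**inputs, max_new_tokens=4)
print(tokenizer.decode(ids[0], skip_special_tokens=True))
```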

    Entity Linking for Queries by Searching Wikipedia Sentences

    We present a simple yet effective approach for linking entities in queries. The key idea is to search Wikipedia articles for sentences similar to a query and directly use the human-annotated entities in those similar sentences as candidate entities for the query. Then, we employ a rich set of features, such as link probability, context matching, word embeddings, and relatedness among candidate entities as well as their related entities, to rank the candidates under a regression-based framework. The advantages of our approach, which benefit both the ranking process and the final linking result, lie in two aspects. First, it greatly reduces the number of candidate entities by filtering out entities irrelevant to the words in the query. Second, we obtain a query-sensitive prior probability in addition to the static link probability derived from all Wikipedia articles. We conduct experiments on two benchmark datasets for entity linking in queries, namely the ERD14 dataset and the GERDAQ dataset. Experimental results show that our method outperforms state-of-the-art systems, yielding 75.0% F1 on the ERD14 dataset and 56.9% on the GERDAQ dataset.
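    The pipeline described above, candidate generation via sentence search followed by feature-based ranking, can be sketched at toy scale as follows. The two-sentence "corpus", the Jaccard-overlap stand-in for sentence search, and the use of similarity as a query-sensitive prior are all simplifications for illustration; the paper uses full Wikipedia and a learned regression ranker over a much richer feature set.

```python
# A toy sketch of the candidate-generation-then-ranking pipeline.
from collections import Counter

# Invented mini corpus of (sentence, human-annotated entities) pairs.
annotated = [
    ("the big apple is a nickname for new york city", {"New York City"}),
    ("apple designs consumer electronics such as the iphone", {"Apple Inc.", "IPhone"}),
]

def overlap(a, b):
    """Jaccard word overlap as a stand-in for the paper's sentence search."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)

def candidates(query, k=1):
    # Search sentences similar to the query; the human-annotated entities
    # in the top sentences become the query's candidate entities.
    ranked = sorted(annotated, key=lambda s: overlap(query, s[0]), reverse=True)
    cands = Counter()
    for sent, ents in ranked[:k]:
        for e in ents:
            cands[e] += overlap(query, sent)  # query-sensitive prior
    return cands

print(candidates("apple new york store").most_common())
```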

    A Modular Task-oriented Dialogue System Using a Neural Mixture-of-Experts

    End-to-end Task-oriented Dialogue Systems (TDSs) have attracted a lot of attention for their superiority (e.g., in terms of global optimization) over pipelined, modularized TDSs. Previous studies on end-to-end TDSs use a single-module model to generate responses for complex dialogue contexts. However, no single model consistently outperforms the others in all cases. We propose a neural Modular Task-oriented Dialogue System (MTDS) framework, in which several expert bots are combined to generate the response for a given dialogue context. MTDS consists of a chair bot and several expert bots. Each expert bot is specialized for a particular situation, e.g., one domain, one type of system action, etc. The chair bot coordinates the expert bots and adaptively selects an expert bot to generate the appropriate response. We further propose a Token-level Mixture-of-Experts (TokenMoE) model to implement MTDS, where the expert bots predict multiple tokens at each timestep and the chair bot determines the final generated token by taking the outputs of all expert bots fully into account. Both the chair bot and the expert bots are jointly trained in an end-to-end fashion. To verify the effectiveness of TokenMoE, we carry out extensive experiments on a benchmark dataset. Compared with a single-module baseline, TokenMoE improves the inform rate by 8.1% and the success rate by 0.8%.
    Comment: Proceedings of the 2019 SIGIR Workshop WCIS: Workshop on Conversational Interaction Systems
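    A minimal sketch of the token-level mixing step, assuming each expert bot exposes next-token logits and the chair bot produces mixture weights from its own decoder state; the dimensions and random tensors below are placeholders for real model outputs.

```python
# Token-level mixture of experts: the chair weighs experts, then mixes
# their next-token distributions into one distribution per timestep.
import torch
import torch.nn.functional as F

vocab, n_experts, hidden = 100, 3, 16
expert_logits = torch.randn(n_experts, vocab)   # one row per expert bot
chair_state = torch.randn(hidden)               # chair bot's decoder state
chair = torch.nn.Linear(hidden, n_experts)      # chair's gating network

weights = F.softmax(chair(chair_state), dim=-1)        # (n_experts,)
mixture = weights @ F.softmax(expert_logits, dim=-1)   # (vocab,)
next_token = mixture.argmax().item()
```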

    Towards Empathetic Dialogue Generation over Multi-type Knowledge

    Enabling machines with empathetic abilities to provide context-consistent responses is crucial at both the semantic and the emotional level. The task of empathetic dialogue generation is proposed to address this problem. However, a lack of external knowledge makes it difficult to perceive implicit emotions from limited dialogue history. To address these challenges, we propose to leverage multi-type knowledge, i.e., commonsense knowledge and an emotional lexicon, to explicitly understand and express emotions in empathetic dialogue generation. We first enrich the dialogue history by jointly interacting with the two types of knowledge and construct an emotional context graph. Then we introduce a multi-type knowledge-aware context encoder to learn emotional context representations and distill emotional signals, which are the prerequisites for predicting the emotions expressed in responses. Finally, we propose an emotional cross-attention mechanism to exploit the emotional dependencies between the emotional context graph and the target empathetic response. Extensive experiments on a benchmark dataset show that our proposed framework outperforms state-of-the-art baselines in terms of both automatic metrics and human evaluations.
    Comment: arXiv admin note: text overlap with arXiv:1911.0869
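    A minimal sketch of the cross-attention step, assuming the emotional context graph has already been encoded into node vectors and the response decoder provides query states; graph construction and the underlying knowledge sources are out of scope here, and all tensors are random placeholders.

```python
# Cross-attention from response tokens to an encoded emotional context:
# queries come from the response, keys/values from the context graph,
# so each generated token can attend to relevant emotional signals.
import torch

d = 32
context_nodes = torch.randn(10, d)   # encoded emotional context graph nodes
response_tokens = torch.randn(5, d)  # decoder states for the target response

attn = torch.nn.MultiheadAttention(embed_dim=d, num_heads=4)
out, weights = attn(response_tokens.unsqueeze(1),   # (len, batch=1, d)
                    context_nodes.unsqueeze(1),
                    context_nodes.unsqueeze(1))
```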

    Detecting and Classifying Malevolent Dialogue Responses: Taxonomy, Data and Methodology

    Conversational interfaces are increasingly popular as a way of connecting people to information. Corpus-based conversational interfaces are able to generate more diverse and natural responses than template-based or retrieval-based agents. With the increased generative capacity of corpus-based conversational agents comes the need to classify and filter out malevolent responses that are inappropriate in terms of content and dialogue acts. Previous studies on recognizing and classifying inappropriate content have mostly focused on a single category of malevolence or on single sentences instead of entire dialogues. In this paper, we define the task of Malevolent Dialogue Response Detection and Classification (MDRDC). We make three contributions to advance research on this task. First, we present a Hierarchical Malevolent Dialogue Taxonomy (HMDT). Second, we create a labelled multi-turn dialogue dataset and formulate the MDRDC task as a hierarchical classification task over this taxonomy. Third, we apply state-of-the-art text classification methods to the MDRDC task and report on extensive experiments assessing the performance of these approaches.
    Comment: under review at JASIS
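    A toy sketch of hierarchical classification over a two-level taxonomy: predict a fine-grained label with any text classifier, then read off its coarse-grained parent so the two levels stay consistent by construction. The label names below are invented placeholders, not the actual HMDT categories.

```python
# Hierarchical classification: fine label first, coarse parent derived.
taxonomy = {
    "non-malevolent": ["non-malevolent"],
    "malevolent": ["insult", "threat", "deception"],
}
fine_to_coarse = {f: c for c, fines in taxonomy.items() for f in fines}

def classify(utterance, fine_classifier):
    """Predict a fine-grained label, then map it to its coarse parent."""
    fine = fine_classifier(utterance)   # any text classifier plugs in here
    return fine_to_coarse[fine], fine

# Trivial keyword "classifier" standing in for a trained model.
toy = lambda u: "threat" if "hurt" in u else "non-malevolent"
print(classify("I will hurt you", toy))
```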

    Improving Background Based Conversation with Context-aware Knowledge Pre-selection

    Background Based Conversations (BBCs) have been developed to make dialogue systems generate more informative and natural responses by leveraging background knowledge. Existing methods for BBCs fall into two categories: extraction-based methods and generation-based methods. The former extract spans from background material as responses, which are not necessarily natural. The latter generate responses that are natural but not necessarily effective at leveraging background knowledge. In this paper, we focus on generation-based methods and propose a model, Context-aware Knowledge Pre-selection (CaKe), which introduces a pre-selection process that uses dynamic bi-directional attention to improve knowledge selection, using the utterance history as prior information to select the most relevant background material. Experimental results show that our model is superior to current state-of-the-art baselines, indicating that it benefits from the pre-selection process, which improves informativeness and fluency.
    Comment: SCAI 2019 workshop paper
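    A minimal sketch of the pre-selection idea: score background sentences against the encoded utterance history via an alignment (attention) matrix and keep only the top-scoring material before generation. The random embeddings stand in for a real encoder, and only a simplified form of the bi-directional attention is shown.

```python
# Context-aware pre-selection of background material, simplified.
import torch

d = 64
history = torch.randn(8, d)       # encoded utterance-history tokens
background = torch.randn(20, d)   # encoded background-material sentences

# Alignment matrix between background sentences and history tokens.
sim = background @ history.T                 # (20, 8)

# Background -> history: a sentence's relevance is its best alignment with
# any history token; the max over the other axis gives the second direction.
relevance = sim.max(dim=1).values            # (20,)
coverage = sim.max(dim=0).values             # (8,), history -> background
top = torch.topk(relevance, k=3).indices     # pre-selected material
```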

    Improving End-to-End Sequential Recommendations with Intent-aware Diversification

    Sequential Recommenders (SRs), which capture users' dynamic intents by modeling sequential user behaviors, can recommend highly relevant products to users. Previous work on SRs has mostly focused on optimizing recommendation accuracy, often ignoring recommendation diversity, even though diversity is an important criterion for evaluating recommendation performance. Most existing methods for improving the diversity of recommendations are not directly applicable to SRs because they assume that user intents are static and rely on post-processing the list of recommendations to promote diversity. We consider both recommendation accuracy and diversity for SRs by proposing an end-to-end neural model, called Intent-aware Diversified Sequential Recommendation (IDSR). Specifically, we introduce an Implicit Intent Mining (IIM) module into SRs to capture the different user intents reflected in user behavior sequences. Then, we design an Intent-aware Diversity Promoting (IDP) loss to supervise the learning of the IIM module and force the model to take recommendation diversity into consideration during training. Extensive experiments on two benchmark datasets show that IDSR significantly outperforms state-of-the-art methods in terms of recommendation diversity while yielding comparable or superior recommendation accuracy.
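    A hedged sketch of intent-aware scoring, loosely following the abstract: latent intent queries attend over the behavior sequence (the intent-mining idea), and aggregating item scores over intents lets distinct intents surface distinct items, which is what a diversity-promoting objective would reward. The exact IIM architecture and IDP loss are not reproduced; all tensors and dimensions are placeholders.

```python
# Intent-aware scoring with latent intent queries, simplified.
import torch
import torch.nn.functional as F

n_items, n_intents, d = 50, 4, 32
seq = torch.randn(10, d)                    # encoded user behavior sequence
intent_probes = torch.randn(n_intents, d)   # learnable intent queries
item_emb = torch.randn(n_items, d)

# Each intent attends over the sequence to form its own representation.
attn = F.softmax(intent_probes @ seq.T, dim=-1)   # (n_intents, 10)
intents = attn @ seq                              # (n_intents, d)

# Per-intent item scores; taking the max over intents lets different
# intents contribute different items to the final list.
scores = intents @ item_emb.T                     # (n_intents, n_items)
recs = scores.max(dim=0).values.topk(5).indices
```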

    RADE: Reference-Assisted Dialogue Evaluation for Open-Domain Dialogue

    Evaluating open-domain dialogue systems is challenging for reasons such as the one-to-many problem, i.e., there are many appropriate responses beyond the single golden response. At present, automatic evaluation methods lack consistency with human judgments, while reliable human evaluation is time- and cost-intensive. To this end, we propose the Reference-Assisted Dialogue Evaluation (RADE) approach under a multi-task learning framework, which leverages pre-created utterances as references in addition to the gold response in order to relieve the one-to-many problem. Specifically, RADE explicitly compares the reference and the candidate response to predict their overall scores. Moreover, an auxiliary response-generation task enhances prediction via a shared encoder. To support RADE, we extend three datasets with additional human-annotated rated responses beyond the single golden response. Experiments on our three datasets and two existing benchmarks demonstrate the effectiveness of our method, whose Pearson, Spearman, and Kendall correlations with human evaluation outperform state-of-the-art baselines.
    Comment: 19 pages, Accepted by ACL2023 main conference
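    A minimal sketch of reference-assisted scoring, assuming context, reference, and candidate are already embedded and a small recurrent encoder is shared; RADE's actual encoder, pairing scheme, and auxiliary generation head are not reproduced here.

```python
# Reference-assisted scoring: encode [context; reference; candidate]
# jointly and regress a quality score from the final hidden state.
import torch

d = 128
encoder = torch.nn.GRU(input_size=d, hidden_size=d, batch_first=True)
scorer = torch.nn.Linear(d, 1)

def rate(context_emb, reference_emb, candidate_emb):
    # The shared encoder reads the concatenated sequence; in RADE the
    # same encoder also feeds an auxiliary response-generation task.
    pair = torch.cat([context_emb, reference_emb, candidate_emb], dim=1)
    _, h = encoder(pair)
    return scorer(h[-1]).squeeze(-1)

score = rate(torch.randn(1, 6, d), torch.randn(1, 5, d), torch.randn(1, 5, d))
```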

    Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents

    Large Language Models (LLMs) have demonstrated a remarkable ability to generalize zero-shot to various language-related tasks. This paper explores generative LLMs such as ChatGPT and GPT-4 for relevance ranking in Information Retrieval (IR). Surprisingly, our experiments reveal that properly instructed ChatGPT and GPT-4 can deliver results that are competitive with, and even superior to, supervised methods on popular IR benchmarks. Notably, GPT-4 outperforms the fully fine-tuned monoT5-3B (trained on MS MARCO) by an average of 2.7 nDCG on TREC datasets, an average of 2.3 nDCG on eight BEIR datasets, and an average of 2.7 nDCG on the ten low-resource languages of Mr.TyDi. Subsequently, we delve into the potential for distilling the ranking capabilities of ChatGPT into a specialized model. Our small specialized model, trained on 10K ChatGPT-generated examples, outperforms monoT5 trained on 400K annotated MS MARCO examples on BEIR. The code to reproduce our results is available at www.github.com/sunnweiwei/RankGP
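    The re-ranking interface can be sketched as instruction-based permutation generation: show the LLM numbered passages, ask for an ordering, and parse the returned permutation. The prompt wording below is paraphrased rather than the paper's exact template, and the LLM call is left provider-specific.

```python
# Permutation-generation re-ranking: build the prompt, parse the ordering.
import re

def build_prompt(query, passages):
    lines = [f"[{i + 1}] {p}" for i, p in enumerate(passages)]
    return (f"Rank the following passages by relevance to the query.\n"
            f"Query: {query}\n" + "\n".join(lines) +
            "\nAnswer with identifiers only, e.g. [2] > [1] > [3].")

def parse_ranking(llm_output, passages):
    order = [int(i) - 1 for i in re.findall(r"\[(\d+)\]", llm_output)]
    return [passages[i] for i in order if 0 <= i < len(passages)]

passages = ["Cats are mammals.", "Python is a language.", "Cats purr."]
# reply = call_your_llm(build_prompt("cat facts", passages))  # provider-specific
reply = "[1] > [3] > [2]"  # stand-in for the model's answer
print(parse_ranking(reply, passages))
```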